Leveraging Paraphrase Labels to Extract Synonyms from Twitter

Authors

  • Maria Antoniak
  • Eric Bell
  • Fei Xia
Abstract

We present an approach for automatically learning synonyms from a corpus of paraphrased tweets. The synonyms are learned by using shallow parse chunks to create candidate synonyms and their context windows, and the synonyms are substituted back into a paraphrase detection system that uses machine translation metrics as features for a classifier. We find a 2.29% improvement in F1 when we train and test on the paraphrase training set, demonstrating the importance of discovering high-quality synonyms. We also find 9.8% better coverage of the paraphrase corpus using our synonyms rather than larger, existing synonym resources, demonstrating the power of extracting synonyms that are representative of the topics in the test set.
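
As a rough illustration of pairing candidate synonyms with their context windows, the Python sketch below scans paraphrased tweet pairs and records differing words that share the same neighboring words. It is not the authors' system: it works on whitespace tokens instead of shallow parse chunks, and the function name `candidate_synonyms` and the `window` parameter are hypothetical.

```python
from collections import defaultdict


def candidate_synonyms(pairs, window=1):
    """Count word pairs that fill the same context in two paraphrased tweets.

    Assumption: tokens are whitespace-separated words, not shallow parse chunks.
    """
    counts = defaultdict(int)
    for left, right in pairs:
        a, b = left.lower().split(), right.lower().split()
        # Index every token of the first tweet by its surrounding context window.
        contexts = defaultdict(set)
        for i, tok in enumerate(a):
            ctx = (tuple(a[max(0, i - window):i]), tuple(a[i + 1:i + 1 + window]))
            contexts[ctx].add(tok)
        # A different token in the second tweet with an identical context
        # is recorded as a candidate synonym pair.
        for j, tok in enumerate(b):
            ctx = (tuple(b[max(0, j - window):j]), tuple(b[j + 1:j + 1 + window]))
            for other in contexts.get(ctx, ()):
                if other != tok:
                    counts[tuple(sorted((other, tok)))] += 1
    return dict(counts)


if __name__ == "__main__":
    pairs = [("the movie was awesome tonight", "the movie was great tonight")]
    print(candidate_synonyms(pairs))
```

Candidates collected this way would still need filtering (for example by frequency) before being substituted back into a paraphrase detector; that step is not shown here.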

Related resources

ECNU: Leveraging Word Embeddings to Boost Performance for Paraphrase in Twitter

This paper describes our approaches to paraphrase recognition in Twitter organized as task 1 in Semantic Evaluation 2015. Lots of approaches have been proposed to address the paraphrasing task on conventional texts (surveyed in (Madnani and Dorr, 2010)). In this work we examined the effectiveness of various linguistic features proposed in traditional paraphrasing task on informal texts, (i.e.,...

Extract Domain-specific Paraphrase from Monolingual Corpus for Automatic Evaluation of Machine Translation

Paraphrase can help match synonyms or match phrases with the same or similar meaning, thus it plays an important role in automatic evaluation of machine translation. The traditional approaches extract paraphrase in general domain from bilingual corpus. Because the WMT16 metrics task consists of three subtasks, namely news domain, medical domain, and IT domain, we propose to extract domainspecif...

Paraphrase Alignment for Synonym Evidence Discovery

We describe a new unsupervised approach for synonymy discovery by aligning paraphrases in monolingual domain corpora. For that purpose, we identify phrasal terms that convey most of the concepts within domains and adapt a methodology for the automatic extraction and alignment of paraphrases to identify paraphrase casts from which valid synonyms are discovered. Results performed on two different...

Using Hashtags as Labels for Supervised Learning of Emotions in Twitter Messages

Many college students experience depression or anxiety but do not seek help due to the social stigma associated with psychological counseling services. Automatic techniques to classify social media messages based on the emotions they express can assist in the early detection of students in need of counseling. Supervised machine learning methods yield accurate results but require training datase...

Extracting Lexically Divergent Paraphrases from Twitter

We present MULTIP (Multi-instance Learning Paraphrase Model), a new model suited to identify paraphrases within the short messages on Twitter. We jointly model paraphrase relations between word and sentence pairs and assume only sentence-level annotations during learning. Using this principled latent variable model alone, we achieve the performance competitive with a state-of-the-art method whi...

Publication date: 2015